Leveraging locality for topic identification of conversational speech
Abstract
We evaluate the limitations of the bag-of-words assumption for topic identification of conversational discourse by examining whether topic-dependent word occurrence statistics are also position-independent. We demonstrate where the assumption is violated in conversational speech corpora and show how the relevance of words to the classification task decreases over the length of the document. We seek to improve topic identification by modeling this topic drift phenomenon and weighting word counts according to a decay function over the length of the document. By applying a global decay rate to all words, we observe a relative reduction in error rates of 23-47% on conversational corpora. Furthermore, we apply a minimum classification error (MCE) training procedure to learn per-word decay rates and reduce error rates by up to an additional 27%.
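The position-dependent weighting described in the abstract can be illustrated with a minimal sketch. The abstract does not give the form of the decay function, so the exponential decay over relative position used here is an assumption, as are the names `decayed_counts`, `decay_rate`, and `per_word_rates`, which are hypothetical. The sketch shows only the weighting step, not the MCE training that would learn the per-word rates.

```python
import math
from collections import defaultdict


def decayed_counts(tokens, decay_rate=1.0, per_word_rates=None):
    """Return position-weighted word counts for one document.

    Each occurrence at relative position p in [0, 1) contributes
    exp(-rate * p) instead of 1. The exponential form is an assumption;
    the source only says "a decay function over the length of the document".
    """
    per_word_rates = per_word_rates or {}
    n = len(tokens)
    counts = defaultdict(float)
    for i, word in enumerate(tokens):
        position = i / n  # relative position of this occurrence
        # Use a word-specific rate if one is given, else the global rate.
        rate = per_word_rates.get(word, decay_rate)
        counts[word] += math.exp(-rate * position)
    return dict(counts)


tokens = ["hello", "music", "music", "weather"]
plain = decayed_counts(tokens, decay_rate=0.0)      # ordinary bag-of-words counts
weighted = decayed_counts(tokens, decay_rate=1.0)   # later words count for less
```

With `decay_rate=0.0` the function reduces to ordinary bag-of-words counts, which makes the position-independent baseline a special case of the decayed model. Passing `per_word_rates` overrides the global rate for individual words.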
Similar resources
Confidence-Based Techniques for Rapid and Robust Topic Identification of Conversational Telephone Speech
We investigate the impact of automatic speech recognition errors on the accuracy of topic identification in conversational telephone speech. We present a modified TF-IDF feature-weighting calculation that provides significant robustness under various recognition error conditions. For our experiments we take conversations from the Fisher corpus to produce 1-best and lattice outputs using one reco...
Techniques for rapid and robust topic identification of conversational telephone speech
In this paper, we investigate the impact of automatic speech recognition (ASR) errors on the accuracy of topic identification in conversational telephone speech. We present a modified TF-IDF feature weighting calculation that provides significant robustness under various recognition error conditions. For our experiments we take conversations from the Fisher corpus to produce 1-best and lattice ...
A Boosting Approach to Topic Spotting on Subdialogues
We report the results of a study on topic spotting in conversational speech. Using a machine learning approach, we build classifiers that accept an audio file of conversational human speech as input, and output an estimate of the topic being discussed. Our methodology makes use of a well-known corpus of transcribed and topic-labeled speech (the Switchboard corpus), and involves an interesting do...
Topic Identification from Audio Recordings Using Rich Recognition Results and Neural Network Based Classifiers
This paper investigates the use of a Neural Network classifier for topic identification from conversational telephone speech, which exploits rich recognition results coming from an automatic speech recognizer. The baseline features used to feed the neural classifier are produced using the words extracted from the 1-best sequence. Rich recognition results include the word union of the first n-be...
Topic Learning in Text and Conversational Speech
Publication date: 2013